object ChiSquareTest
Chi-square hypothesis testing for categorical data.
See Wikipedia for more information on the Chi-squared test.
- Annotations
- @Since( "2.2.0" )
- Source
- ChiSquareTest.scala
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @IntrinsicCandidate()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @IntrinsicCandidate()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @IntrinsicCandidate()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @IntrinsicCandidate()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @IntrinsicCandidate()
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
test(dataset: DataFrame, featuresCol: String, labelCol: String, flatten: Boolean): DataFrame
- dataset
DataFrame of categorical labels and categorical features. Real-valued features will be treated as categorical for each distinct value.
- featuresCol
Name of features column in dataset, of type Vector (VectorUDT)
- labelCol
Name of label column in dataset, of any numerical type
- flatten
If false, the returned DataFrame contains only a single Row; otherwise, it contains one row per feature.
- Annotations
- @Since( "3.1.0" )
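As a rough illustration of the flatten parameter, the sketch below calls this overload with flatten = true on a small made-up dataset. The object name FlattenedChiSquareExample, the helper run, the local master setting and the data values are all assumptions for illustration. The result columns are displayed rather than named, since this page does not list them.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.stat.ChiSquareTest
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; not part of Spark.
object FlattenedChiSquareExample {

  // Builds a tiny illustrative dataset and runs the test with flatten = true,
  // so the result should hold one row per feature instead of a single Row.
  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // (label, features): two features per row, values chosen arbitrarily.
    val data = Seq(
      (0.0, Vectors.dense(0.5, 10.0)),
      (0.0, Vectors.dense(1.5, 20.0)),
      (1.0, Vectors.dense(1.5, 30.0)),
      (0.0, Vectors.dense(3.5, 30.0)),
      (0.0, Vectors.dense(3.5, 40.0)),
      (1.0, Vectors.dense(3.5, 40.0))
    )
    val df = data.toDF("label", "features")

    ChiSquareTest.test(df, "features", "label", true)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session is acceptable for this demo.
    val spark = SparkSession.builder()
      .appName("FlattenedChiSquareExample")
      .master("local[*]")
      .getOrCreate()
    try {
      val flat = run(spark)
      flat.printSchema() // inspect the per-feature result columns
      flat.show(false)
    } finally {
      spark.stop()
    }
  }
}
```

With two features in the input vectors, the flattened result is expected to have two rows, which can be easier to filter or join than the single-Row form.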
-
def
test(dataset: DataFrame, featuresCol: String, labelCol: String): DataFrame
Conduct Pearson's independence test for every feature against the label. For each feature, the (feature, label) pairs are converted into a contingency matrix for which the Chi-squared statistic is computed. All label and feature values must be categorical.
The null hypothesis is that the occurrence of the outcomes is statistically independent.
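To make the description above concrete, here is a minimal plain-Scala sketch of how a contingency matrix and Pearson's chi-squared statistic could be computed for one feature. It is not Spark's implementation: the object ChiSquareSketch and its methods are hypothetical names, and the p-value step is omitted because it needs a chi-squared distribution function.

```scala
// Hypothetical helper, for illustration only; not part of Spark.
object ChiSquareSketch {

  // Builds the contingency matrix for one feature:
  // counts(i)(j) = number of pairs with the i-th distinct feature value
  // and the j-th distinct label value.
  def contingency(pairs: Seq[(Double, Double)]): Array[Array[Double]] = {
    val featureValues = pairs.map(_._1).distinct.sorted
    val labelValues = pairs.map(_._2).distinct.sorted
    val counts = Array.fill(featureValues.size, labelValues.size)(0.0)
    for ((f, l) <- pairs) {
      counts(featureValues.indexOf(f))(labelValues.indexOf(l)) += 1.0
    }
    counts
  }

  // Returns (statistic, degreesOfFreedom), where the statistic is the sum over
  // all cells of (observed - expected)^2 / expected and, under independence,
  // expected = rowSum * colSum / total.
  def statistic(counts: Array[Array[Double]]): (Double, Int) = {
    val rowSums = counts.map(_.sum)
    val colSums = counts.transpose.map(_.sum)
    val total = rowSums.sum
    val cells = for {
      i <- counts.indices
      j <- counts(i).indices
    } yield {
      val expected = rowSums(i) * colSums(j) / total
      val diff = counts(i)(j) - expected
      diff * diff / expected
    }
    val dof = (rowSums.length - 1) * (colSums.length - 1)
    (cells.sum, dof)
  }
}
```

For the six (feature, label) pairs (0.5, 0), (1.5, 0), (1.5, 1), (3.5, 0), (3.5, 0), (3.5, 1), this sketch yields a statistic of 0.75 with 2 degrees of freedom. A large statistic relative to the degrees of freedom is evidence against the null hypothesis of independence.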
- dataset
DataFrame of categorical labels and categorical features. Real-valued features will be treated as categorical for each distinct value.
- featuresCol
Name of features column in dataset, of type Vector (VectorUDT)
- labelCol
Name of label column in dataset, of any numerical type
- returns
DataFrame containing the test result for every feature against the label. This DataFrame will contain a single Row with the following fields:
pValues: Vector
degreesOfFreedom: Array[Int]
statistics: Vector
Each of these fields has one value per feature.
- Annotations
- @Since( "2.2.0" )
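A minimal usage sketch for this method follows, assuming a local Spark session and a small made-up dataset. The object name ChiSquareTestExample and the helper run are hypothetical. The fields are read by the names listed under returns above.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.stat.ChiSquareTest
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical example object; not part of Spark.
object ChiSquareTestExample {

  // Runs the test on a tiny illustrative dataset and returns the single result Row.
  def run(spark: SparkSession): Row = {
    import spark.implicits._

    // (label, features): two features per row, values chosen arbitrarily.
    val data = Seq(
      (0.0, Vectors.dense(0.5, 10.0)),
      (0.0, Vectors.dense(1.5, 20.0)),
      (1.0, Vectors.dense(1.5, 30.0)),
      (0.0, Vectors.dense(3.5, 30.0)),
      (0.0, Vectors.dense(3.5, 40.0)),
      (1.0, Vectors.dense(3.5, 40.0))
    )
    val df = data.toDF("label", "features")

    ChiSquareTest.test(df, "features", "label").head()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session is acceptable for this demo.
    val spark = SparkSession.builder()
      .appName("ChiSquareTestExample")
      .master("local[*]")
      .getOrCreate()
    try {
      val result = run(spark)
      // Each field holds one entry per feature, in feature-index order.
      println(s"pValues = ${result.getAs[Vector]("pValues")}")
      println(s"degreesOfFreedom = ${result.getAs[Seq[Int]]("degreesOfFreedom").mkString("[", ",", "]")}")
      println(s"statistics = ${result.getAs[Vector]("statistics")}")
    } finally {
      spark.stop()
    }
  }
}
```

Because the whole result is a single Row, head() is enough to retrieve it. A feature whose p-value falls below the chosen significance level would lead to rejecting independence from the label.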
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
Deprecated Value Members
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated
- Deprecated