JASP: crosstabs and using filters and missing values to make t-tests work
2K views
Mar 7, 2025
We use the free JASP statistics software to do a t-test, then use filters and missing values to make the data make sense - and to make the independent samples t-test work (because it can only handle two values and our variable had nine!). Then we wrap up with crosstabs (contingency tables) and a chi-square to illustrate what's really happening better than the t-test can.
Video Transcript
0:00
So let's jump right into JASP by doing a t-test and a crosstab on the huge General Social Survey. We'll open up an SPSS file of the General Social Survey. I've already cut out about 100 variables, the ones where we ask what country somebody is from, so we've trimmed it down to a reasonable size, which is still quite large. Now, you might notice all of these 110s here. That's because, in the merry way that this program converted the original data file into the one you see now, the missing values did not come across, and they show up as things like 110. So let's get a frequency count for a couple of sample variables and we'll see what we can do about that.
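If you want to reproduce this frequency check outside of JASP, it is a one-liner in pandas. This is just a sketch: it assumes the GSS extract has been exported to CSV as gss_extract.csv and that the work-for-yourself variable is in a column called wrkslf (both names are hypothetical; yours may differ).

```python
import pandas as pd

# Load the trimmed GSS extract (file name and column name are assumptions).
gss = pd.read_csv("gss_extract.csv")

# Raw frequency count, including the 105-120 codes that should be missing.
print(gss["wrkslf"].value_counts(dropna=False).sort_index())

# The same counts as percentages of all cases.
print(gss["wrkslf"].value_counts(dropna=False, normalize=True).mul(100).round(1))
```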
0:48
While we're waiting for the descriptives engine to show up, which is a rather interesting delay that I think is unique to this version of JASP: missing values are simply values that we arbitrarily declare to be something we don't want to deal with. Okay, so here you see how many valid answers there are, and then missing is zero. But we know that some people refused to answer questions, or were never even asked the question, and those are the people we'd like to count as missing, as data we don't have. There are three values for this: I believe 110 was "not asked," I think 115 was "not applicable," and I think 120 was "refused to answer." There's also 105, which you can see is not really appropriate for the work-for-self question: do you work for yourself or someone else? The appropriate replies are "you're self-employed" or "you work for someone else," and these others are all missing, and they run from 105 through 120.
1:43
So let's go through and tell the program, in the settings, that 105, 110, 115, and 120 should be ignored. Then we'll "set the current workspace with these values," which basically means Apply, but written in the worst possible way. And now you see that we've got some missing values, and now these numbers make sense. Do you work for yourself, or do you work for someone else? You have about 11 and a half percent and about 88 and a half percent, and then 4% of the total we say is missing: it's not applicable to us, and we want to pretend it isn't there at all. That's what missing means. It's also important to realize that "other" (you see here there are 199 others) is not the same as missing; that's something we want to keep. So that is a very convenient way to get rid of those inappropriate values.
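The same clean-up can be sketched in pandas by recoding those codes to NaN, pandas' missing-value marker. Again this is only an illustration, assuming the missing codes are exactly 105, 110, 115, and 120 and reusing the hypothetical file and column names from above.

```python
import numpy as np
import pandas as pd

gss = pd.read_csv("gss_extract.csv")

# Codes the survey uses for "not asked", "not applicable", "refused", etc.
MISSING_CODES = [105, 110, 115, 120]

# Recode them to NaN so pandas treats them as missing.
gss["wrkslf"] = gss["wrkslf"].replace(MISSING_CODES, np.nan)

# Percentages among valid (non-missing) answers, plus the share missing.
print(gss["wrkslf"].value_counts(normalize=True).mul(100).round(1))
print(f"missing: {gss['wrkslf'].isna().mean() * 100:.1f}% of all cases")
```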
2:45
Now let's go and do an independent samples t-test, being very careful to stay under Classical statistics and not wander into Bayesian. We want to look at doctor visits. What I'm doing here is typing the first few letters of the variable name, and that finds it in the list, because otherwise I would be here clicking through forever and you would get really annoyed with me. So this is how often you make doctor's visits, and how often you see an alternative doctor, and we'll just leave it at that. We're going to run the t-test with both the Student's test and the Welch test. The reason for running the Welch test as well is that with it you don't have to worry about whether the variances of the two groups are the same. There are two formulas for the t-test: Student's, a simple formula that you use when the two groups have the same variation, when the numbers bounce around about the same amount in each group, and then Welch's, which is for when they don't.
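For reference, these are the two formulas being described, in their standard textbook forms (not taken from JASP's documentation): Student's t pools the two sample variances, while Welch's t keeps them separate and adjusts the degrees of freedom.

```latex
% Student's t (equal variances assumed), with pooled variance s_p^2
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}

% Welch's t (no equal-variance assumption), with Welch-Satterthwaite df
t' = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\tfrac{s_1^2}{n_1} + \tfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\tfrac{s_1^2}{n_1} + \tfrac{s_2^2}{n_2}\right)^2}
                 {\tfrac{(s_1^2/n_1)^2}{n_1 - 1} + \tfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```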
3:43
Let's see, we'll run some assumption checks to make sure we're meeting the necessary assumptions. Actually, we're going to leave out normality and just look at the equality of variances, so that we know which of the two tests to use. For the grouping variable we're going to look at whether people's parents were born here: Parents Born. And here we see that there are nine levels to Parents Born, and the most the computer will allow us is two. So let's take a look at Parents Born.
4:19
We'll have to tell it to give us frequency tables; you normally do want frequency tables. And here we see what's going on: both parents born in the US is about 71% of the sample, neither born in the US is about 20%, and then you've got all of these others where you don't really know what's going on, and there aren't that many of them altogether, so we want to exclude them.
4:43
So now we're going to go into the data and try to find Parents Born. Here we go: Parents Born. We double-click here, and here are all the possible values that we have. Now, there are two ways to deal with this. If I were using SPSS or another program where it's easy to create a new variable, I would just compute a "parents born 2" where zero stays zero (both born in the US), eight stays eight (neither born in the US), and everything else is missing. That's what I would do in SPSS; that would be a clever way to do it. However, if you've ever tried to compute a new variable in JASP, you will know why I don't want to do that here. So we've got two choices: we can either declare one through seven to be missing, or we can filter them out entirely by clicking on these check marks. If there's an X, it means we're taking out all the cases that have that value, and this is the easier way to do it. If I wanted to set up missing values just for this variable I could do that too, but then I'd have to laboriously go 1, 2, 3, and so on, so we're not going to do that.
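In pandas the same keep-only-two-groups step is a simple filter. A sketch under the assumption that the column is called parborn and uses 0 for both parents born in the US and 8 for neither (the coding described above):

```python
import pandas as pd

gss = pd.read_csv("gss_extract.csv")

# Keep only the two groups the t-test can handle:
# 0 = both parents born in the US, 8 = neither born in the US.
two_groups = gss[gss["parborn"].isin([0, 8])].copy()

# Give the codes readable labels for later tables.
two_groups["parborn_label"] = two_groups["parborn"].map(
    {0: "both born in US", 8: "neither born in US"}
)

print(two_groups["parborn_label"].value_counts())
```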
5:57
So let's go back to Analyses, and now, miraculously, it should be able to run our independent samples t-test. I think it's having a hard time with alternative doctors, so we're going to take that out and just look at doctor visits. Here you see the statistics, but that doesn't tell you everything you want to know; we also want to know the actual averages, so we have to click here on Descriptives. But now let's take a look at the assumption checks: p is much higher than 0.05, so we're going to say that the variances are the same, which means we can use the Student's t-test. There's also no warning sign here; let's take the assumption check back out. Normally, if there were a problem, it would give us a little asterisk and tell us. So we can take out the Welch test and just use the Student's t-test. Now let's look at the averages, and we see that where neither parent was born in the US the average is 2.4, and where both parents were born here it's 2.8. There are about 1,200 cases here, 1,200 people who answered the question, so it doesn't take much to get a significant answer. And here we see that the answer is definitely significant: less than once in a thousand times would we expect to see a difference this big due to just random chance. So it is significant; there is a difference, and where both parents are born in the US the mean is higher.
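Here is how the same assumption check and t-test could be sketched with SciPy, using the hypothetical parborn column from above plus an equally hypothetical docvisit column for doctor visits. JASP's equality-of-variances check corresponds to a Levene-type test, and equal_var switches between the Student and Welch formulas.

```python
import pandas as pd
from scipy import stats

# Hypothetical column names: docvisit (1 = never ... 5 = very often) and
# parborn (0 = both parents born in US, 8 = neither born in US).
gss = pd.read_csv("gss_extract.csv")
two_groups = gss[gss["parborn"].isin([0, 8])]

both = two_groups.loc[two_groups["parborn"] == 0, "docvisit"].dropna()
neither = two_groups.loc[two_groups["parborn"] == 8, "docvisit"].dropna()

# Equality-of-variances check (Levene's test): a large p-value means the
# equal-variance (Student) formula is reasonable.
print(stats.levene(both, neither))

# Student's t-test (equal variances assumed) and Welch's t-test for comparison.
print(stats.ttest_ind(both, neither, equal_var=True))
print(stats.ttest_ind(both, neither, equal_var=False))

# The group means, which the test statistic alone does not show.
print(both.mean(), neither.mean())
```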
7:26
Just to make sure we understand what a higher mean on that question actually means for doctor's visits, we can go through and look up doctor's visits. This is normally something you do first; I will admit I did it before we started the video. You can see that the higher you go, the more often people are seeing the doctor, and of course the actual data is really the best place to see that. So I'm just looking for doctor's visits... please hang up and try again... and here we have doctor's visits. Now when we double-click on this you can see the labels: 1 is "never," 2 is "seldom," up to 5 is "very often." So again, the higher the number, the more often people see doctors. What we're seeing, then, is not a huge difference. You can say that it's significant, but you can ask whether it's really meaningful, and for that you would probably want to look at a table showing you exactly what's going on.
8:29
So you select Contingency Tables, put Parents Born under Columns, and put doctor visits under Rows. You see that the chi-square is significant, and what that means is that there is some sort of relationship between these two variables, which we already knew from the t-test. So let's take a look at the column percentages. Here you see that for people both of whose parents were born in the US, the largest group is "sometimes," followed by "seldom," followed by "often." For the other group, 24% of people neither of whose parents were born in the US never see the doctor, compared to only about 15% of those whose parents were born in the US. So "sometimes" is still about the same for both groups, but the "never" category is a lot bigger for people neither of whose parents were born here, and the "very often" and "often" categories are higher for those whose parents were both born in the US.
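And a matching sketch of the contingency table and chi-square in pandas and SciPy, again with the hypothetical column names used above; normalize="columns" reproduces the column percentages discussed here.

```python
import pandas as pd
from scipy import stats

gss = pd.read_csv("gss_extract.csv")
two_groups = gss[gss["parborn"].isin([0, 8])].copy()
two_groups["parborn_label"] = two_groups["parborn"].map(
    {0: "both born in US", 8: "neither born in US"}
)

# Cross-tabulation of doctor visits (rows) by parental origin (columns).
table = pd.crosstab(two_groups["docvisit"], two_groups["parborn_label"])

# Chi-square test of independence on the raw counts.
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4g}")

# Column percentages: each parental-origin group sums to 100%.
print(pd.crosstab(two_groups["docvisit"], two_groups["parborn_label"],
                  normalize="columns").mul(100).round(1))
```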
9:26
So that's maybe easier to explain to people, and easier to see what the difference is. Averages are great for analysis, and they are very robust, so comparing averages is really good for seeing whether there is a difference. But explaining what's going on is very hard to do with averages. So if you can, create a table; these are normally called crosstabs, and they're an easy way to see what's really going on here. And that's it for today.
#Open Source