The Impact of Codemixing on Language Differentiation in Young Bilinguals

Published by Scholarship@Western, 2020

The phenomenon by which a lexical item or phrase from one language is inserted into another, known as codemixing, is common in adult bilingual communities around the world (Genesee & Nicoladis, 1997). In many types of immersion programs as well, codemixing is a common strategy for introducing target vocabulary. However, little research has been conducted on the precise impact that vocabulary exposure via codemixing may have on how the target item is encoded by child listeners, namely, how it is assigned to one language or another. Bilingual children (n = 10) between 3 and 6 years old were recruited to participate in this experiment, in which phonetically English- or Spanish-apparent nonwords were presented in the context of a "codemixed" or "non-codemixed" sentence and participants were asked to decide to which language the nonword belonged. Results demonstrated a considerable bias toward categorizing most of the nonwords as Spanish (the non-dominant language for all ten children), although the language in which the nonword was introduced also considerably impacted children's judgments. While the nonword's phonology appears somewhat influential in determining its language of origin, it was not as impactful as the overall linguistic context.

phonemes (Hume, 2008). Experiments in which participants were required to select one member of a minimal pair that differed only in VOT expand the influence of context even further to include syntactic and semantic considerations; for example, when presented with phonetically-ambiguous /baɪ/ buy or /pʰaɪ/ pie, participants who heard the target in the context of the sentence Mary hates the ___ were more likely to choose the noun candidate, while those who heard Mary hates to ___ chose the verb (Fox & Blumstein, 2016). Famously, the degree to which an acoustic candidate resembles a real word also influences listeners' perception of ambiguous phonetic stimuli, a phenomenon known as the Ganong effect (Ganong, 1980).
With the above confounding variables in mind, bilingual children face a unique set of challenges when acquiring the phonemic inventory of each language, since a number of subtle acoustic differences must be perceived in order to be encoded in the lexicon, perhaps in spite of biasing factors such as syntactic context or resemblance to a previously-encountered item. The way that early linguistic input is processed must somehow account for the complete phonologies of both languages, including potentially divergent renderings of the same phoneme across both languages, while also preventing structural discrepancies between the languages from interfering with accurate perception. When it comes to codemixing, bilinguals may need to actively disregard contradictory cues from syntactic or phonetic contexts in order to accurately parse an inserted phrase or lexeme as belonging to the other language.
The way in which speech is processed may support or hinder this goal.
Several models have been proposed to explain specifically how phonetic input is processed by bilingual adults. Historically, several possibilities have been considered with regard to the degree of interaction between the two systems, including the possibility that bilinguals represent the phonologies of both their languages within a single system (known as the Unitary Language System Hypothesis; Volterra & Taeschner, 1978). However, it is now widely accepted that both languages' phonotactics, while represented separately, are activated in response to any acoustic input, and that while these systems are differentiated in bilinguals, they can and do interact when processing new input (Altenberg & Cairns, 1983; Fabiano-Smith & Barlow, 2010; Blumenfeld & Marian, 2011; Von Holzen & Mani, 2012; Rankin, 2013; Blanco-Elorrieta et al., 2018; López, 2019). Such empirical findings are readily accounted for in interactive models of language processing, such as McClelland and Elman's (1986) TRACE model or, more recently, the Bilingual Interactive Activation Plus (BIA+) model (Dijkstra & van Heuven, 2002).
Findings from a lexical decision task with German-English bilingual adults illustrate that the adult bilingual has two sets of phonotactics, and that both sets are made available when processing new lexical information. The two phonological systems may interact in a way that lengthens bilinguals' response times to lexical stimuli compared to monolinguals. These findings can be explained by interactive processing models, but not the Independence Hypothesis (Altenberg & Cairns, 1983). A more recent brain imaging study, during which adult bimodal bilinguals were asked to rapidly name items in English and/or American Sign Language, saw a difference in magnetoencephalography (MEG) results when participants were triggered to activate a language versus when they were asked to suppress a language, with increased activity in regions of the brain associated with executive function and self-regulation. This indicates that language suppression requires more cognitive effort than language activation (Blanco-Elorrieta et al., 2018), again supporting the interactive activation model. Evidence from adult bilinguals supports an interactive processing model over two independent linguistic systems; can the same be said for the bilingual child? One of the first milestones which distinguishes the child acquiring two languages from their monolingual peers is phonological: bilingual infants acquire the phonemic inventories of two languages at the same time, and with the same precision, that monolingual infants acquire the inventory of one (Pearson et al., 1993, 1995; Petitto et al., 2001).
Newborns can already distinguish between languages of different rhythmical classes (e.g. English and Mandarin), as demonstrated by sucking-rate measurement; by four months old, this awareness expands to include languages of the same rhythmical class (Werker & Byers-Heinlein, 2008). At this point, they can also perceive voicing contrasts at the phonemic level (Eimas et al., 1971). By ten months old, both monolingual and bilingual babies show a preference for words and word lists containing phonotactically permissible items, demonstrating their sensitivity to syllabic structure and the constraints of each of their native languages (Werker & Byers-Heinlein, 2008). Variations in language-specific realizations of specific phonemes, such as voice onset time (VOT) for stop consonants, are established by two years and three months (2;3) old (Deuchar & Clark, 1996). It appears that by three years old, bilingual children have acquired complete phonemic inventories for both of their languages at the same rate that monolinguals acquire the inventory of one, with a relatively low level of interference between the two inventories (Fabiano-Smith & Barlow, 2010; Montanari et al., 2018).
However, these findings do not indicate whether the respective phonologies are stored separately or in a shared lexicon. For many years, one of the most widely accepted theories of bilingual language encoding came from Volterra and Taeschner, who argued that children acquire two languages from infancy in three stages: 1) storing vocabulary from both languages in a single, unified lexicon, then 2) distinguishing the lexicons but applying the same syntactic rules, and finally 3) distinguishing both the lexicon and syntax of each language to create two distinct linguistic systems (Volterra & Taeschner, 1978). The assumption that bilingual infants do not distinguish between their languages has been supported by various case studies (Ronjat, 1913; Burling, 1959; Schnitzer & Krasinski, 1994; Vihman, 1985), some of which have also highlighted the acquisition of phonetic segments and attempted to identify an acquisition path for the phonologies of two or more languages. Through the longitudinal observation and analysis of a few bilingual toddlers, these authors conclude that bilingual infants and toddlers do not initially categorize individual phonemes as belonging to one language or another; rather, they begin with a single unitary system that later separates into distinct languages at around two years of age. Deuchar's (1999) finding that bilingual toddlers mix function words more often than content words could also be interpreted as further support that a bilingual child's linguistic systems are not initially differentiated. More recently, other authors have suggested that the bilingual's language systems are, to varying extents, separated from birth (Genesee, 1989; Werker & Byers-Heinlein, 2008). Genesee asserts that children can differentiate pragmatically between linguistic systems from the onset of one- and two-word utterances (approximately 13-18 months old), indicating that they already have two established linguistic systems.
Language mixing, they argue, is the result of other factors, such as lack of lexical knowledge or domain-specificity (1989). A longitudinal study on the simultaneous acquisition of English and German supports this analysis, as the authors demonstrate that "Hannah," the subject of their research, regularly exploits the syntactic features of one language or another in order to maximize her communicative potential (Gawlitzek-Maiwald & Tracy, 1996). Byers-Heinlein takes a more conservative approach, proposing that language differentiation may occur in two distinct stages. In her model, perceptual categorization, under which sensitivities to rhythm, prosody, phonetics and other observable acoustic differences may fall, occurs rather early in development; at the perceptual level, then, languages may be differentiated from birth. Conceptual categorization, on the other hand, requires metalinguistic knowledge and an abstract understanding of language as a whole, which develop later in life through experience and explicit instruction (2014).
Of course, acoustic cues do not exist in a vacuum. In all likelihood, the mind exploits all possible avenues in order to maximize the efficiency with which a given utterance is processed. The wider linguistic context may also contribute to the prediction that a given item belongs to Language A rather than Language B: for example, if the utterance up to this point has been in Spanish, there is little reason for the listener to anticipate that the next word will be English.
In summary, prior research on and models of bilingual phonology acquisition illustrate that, even if not completely differentiated from birth, children acquiring multiple languages are able to distinguish at least the salient acoustic features of each of their languages seemingly from the onset of productive abilities, if not earlier (Byers-Heinlein, 2014; Deuchar & Clark, 1996; Eimas et al., 1971; Fabiano-Smith & Barlow, 2010; Friedrich & Friederici, 2005; Genesee & Nicoladis, 1997). According to interactive models of speech processing, both systems are available at any given time to process input, allowing access to the lexical and phonological specifications of either language.

Pragmatics and Codemixing
Muysken (2000) distinguishes between codemixing and codeswitching in the introductory chapter of his book, stating that switching "suggests something like alternation (as opposed to insertion), and it separates code-mixing too strongly from the phenomena of borrowing and interference" (4). The processes are distinct in that insertion, or codemixing, involves embedding a constituent of one language (often a single lexical item, or a single phrase) into the overall structure of another language; alternation, on the other hand, denotes a switch between the lexical and grammatical structure of one language to that of another (Muysken, 3-9). In this paper, we will adopt Muysken's definition of "codemixing," denoting insertion of a phrase or lexical item of one language into the structure of another, as in We visited Mom's pueblo in Mexico.
One of the primary sources of evidence for the unitary system hypothesis -the idea that bilingual children initially begin with a single unified linguistic system -is the frequency of codemixing in child speech. From the onset of the two-word utterance stage, children seem to codemix with varying frequency, depending on factors such as the interlocutor's rate of codemixing (Comeau et al., 2003), lexical status (Deuchar, 1999), and parental codemixing (Paradis et al., 2000).
Studies on the development of pragmatic differentiation have demonstrated that bilingual children as young as 1;4 have the ability to "match" their language with that of their interlocutor (Genesee & Nicoladis, 1997; Klapicová, 2016; Petitto et al., 2001), thus supporting the theory that codemixing is not due to a lack of language differentiation, but rather to more complex psycholinguistic processes. Genesee and Nicoladis examined a number of input- and child-based explanations for early codemixing behaviors and concluded that neither class of explanation is sufficient on its own to account for the data; rather, a combination of factors appears to be at work (Genesee & Nicoladis, 1997). Child-based explanations attribute codemixing to individual variables such as lexical knowledge and domain-specificity, asserting that the behavior occurs when the child lacks a translation equivalent in the target language, or feels more comfortable discussing a specific topic in the other language, despite having already established a different pattern of language use with the interlocutor (Klapicová, 2016). Proponents of this model often note that children codemix more frequently when speaking a nondominant language, indicating that it is being used to supplement their linguistic expression in L2.
Alternatively, some authors argue that input patterns elicit codemixing behavior; in other words, children codemix in proportion with their interlocutor(s) (Comeau et al., 2003). Supporting evidence for this notion is limited, and correlations between parent and child codemixing appear to vary across communities and even families. Social responses to codemixing have also been suggested as behavioral reinforcers, either encouraging or discouraging children's repeated use of mixed languages in their day-to-day communications (Genesee & Nicoladis, 1997).
Children's ability to produce translation equivalents (TE), for instance, has been presented as evidence that early vocabulary selection is not constrained by notions of mutual exclusivity, which is thought to dictate monolingual vocabulary acquisition (Byers-Heinlein, 2014; Mishina, 1997; Nicoladis, 1998; Quay, 1993). Codemixing also appears to serve as a form of "bilingual bootstrapping," allowing bilingual children to combine their linguistic resources in order to maximize their expressive abilities (Gawlitzek-Maiwald & Tracy, 1996). The rate of lexical growth in each language for bilinguals is also comparable with that of monolinguals, indicating that the acquisition device is able to handle two languages simultaneously without sacrificing progress in either (Petitto et al., 2001). These facts further support the notion that even early codemixing is done with a communicative purpose and is not the product of linguistic confusion.
Regardless of whether they themselves codemix, bilingual children are likely to encounter some form of codemixing in the primary linguistic data, as well, especially if raised in a bilingual community.
If they attend a bilingual school, they are also likely to encounter some form of codemixing either as a pedagogical strategy or a communicative practice. Since even elementary school children are still in the process of fully acquiring their native languages, it is worth considering whether they are readily able to recognize codemixed utterances, particularly when they involve unfamiliar lexical items.

Vocabulary Acquisition
Children do not acquire the majority of their vocabulary through explicit instruction; rather, they encounter novel words and phrases in context as they go about their day-to-day routine. Unfamiliar words may be interpreted and assigned an approximate meaning based on syntactic, pragmatic, and semantic cues, as well as the nonlinguistic context in which the item occurs (Baker et al., 1995). For childhood bilinguals, decoding a novel lexical item involves an additional step: assigning it to one of their languages. In addition to pragmatic cues, phonotactic and phonological cues can assist bilinguals in categorizing unfamiliar items linguistically (Caramazza et al., 1974).
Phonotactic probability and token frequency contribute to lexical organization and recall abilities in both bilingual and monolingual preschoolers, according to findings by Messer et al. In a nonword recall task, lexical items which had a high phonotactic probability in Dutch were remembered more frequently by Dutch monolingual and bilingual kindergarteners, while low phonotactic probability items were less likely to be remembered. This indicates that both the frequency of sound structures and the capacity to remember those structures are important factors in both L1 and L2 vocabulary acquisition (Messer et al., 2010).
Contemporary models of the bilingual lexicon are largely comparable to those of the phonological system(s), focusing primarily on the extent to which bilingual adults separate their languages. Macnamara's (1967) original hypothesis, which regarded the two lexicons as completely distinct entities that could be triggered or suppressed based on context, is now generally regarded as too reductive, albeit partially correct. Kroll and Stewart (1994) built on this idea, proposing that words may be stored separately according to language membership but reference a single, universal inventory of conceptual representations. More recently, several authors have proposed more detailed alternatives to Macnamara's model in order to understand the bilingual lexicon and the interactions between language systems in the bilingual brain. The Bilingual Interactive Activation Plus (BIA+) model asserts that words are represented in a common inventory with a language "tag" that indicates to which language a given item belongs. When a new lexical item is presented to the bilingual speaker, salient features of the word (e.g. orthography/phonetics, phonology, and/or semantics) activate one language "node" while inhibiting the other, thereby progressively constraining the hypothesis space as more information is acquired (Dijkstra & van Heuven, 2002). Similarly, the Inhibitory Control (IC) model holds that bilinguals must simultaneously activate output from one language and inhibit output from the other. This model, however, emphasizes the language task schema, which regulates linguistic output depending on the demands of a particular task or other input (Green, 1998). A notable issue with such models is that they focus on explaining language activation and processing strategies for individual lexical items, while failing to account for input from sentence-level characteristics such as syntax, rhythm/intonation, etc.
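As a rough illustration of the competitive dynamics these models describe, the Python sketch below shows how evidence from individual cues could excite one language node while inhibiting the other. It is a toy, not the published BIA+ implementation, and the feature names and weights are entirely hypothetical.

```python
# Toy sketch of language-node competition (not the published BIA+ model):
# each cue supplies evidence for one language, and each node is inhibited
# by the competing node's current activation.

def update_language_nodes(features, weights, steps=10, rate=0.1):
    """Accumulate evidence for 'english' vs 'spanish' language nodes.

    features: feature names observed in the input.
    weights: dict mapping feature -> (english_evidence, spanish_evidence).
    """
    act = {"english": 0.0, "spanish": 0.0}
    for _ in range(steps):
        for f in features:
            e, s = weights.get(f, (0.0, 0.0))
            # excitation from the cue's own evidence, inhibition from the rival node
            act["english"] += rate * (e - act["spanish"])
            act["spanish"] += rate * (s - act["english"])
    return act

# hypothetical feature weights, for illustration only
weights = {
    "complex_coda": (1.0, 0.0),   # e.g. a coda cluster suggests English
    "tap_r": (0.0, 1.0),          # an alveolar tap suggests Spanish
    "shared_vowel": (0.5, 0.5),   # an ambiguous cue supports both
}

result = update_language_nodes(["complex_coda", "shared_vowel"], weights)
```

Here a nonword carrying one English-only cue and one ambiguous cue drives the English node to a higher activation than the Spanish node, mirroring how the BIA+ hypothesis space narrows as information accumulates.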
However, perceptual cues from the utterance as a whole, particularly prosodic patterns, can in fact help bilingual infants distinguish other aspects of their language such as word order (Gervain & Werker, 2013). If models such as the IC and BIA+ can be applied to young bilinguals as well as adults, then the question becomes whether cues from different domains (i.e. syntax, phonetics, pragmatics, etc.) are equally effective for activating language-specific nodes.

The present study
This study seeks to determine whether phonological or sentence-level cues take precedence for young bilinguals when processing an unfamiliar lexical item and assigning it to one language or the other, especially when these cues may contradict one another. In other words, we want to know how codemixed utterances influence children's acquisition of constituent lexical items. The primary research questions are:
1. Are bilingual children able to recognize an utterance as codemixed on the basis of incongruent phonetic cues (e.g. if a word that "sounds" Spanish is inserted into an otherwise-English utterance)?
2. Does introducing a novel lexical item as the inserted constituent in a codemixed utterance affect the language to which bilingual children assign it?
Bilingual children may establish phonemic and phonotactic inventories for each of their languages as early as birth or as late as two years old. After this point, according to the BIA+ model, they should be able to correctly categorize unfamiliar lexical items based on acoustic properties, such as the presence of certain Spanish- or English-exclusive features (e.g. the lack of complex codas in Spanish or the additional vowel phonemes of English). For shared phonemes, different thresholds for categorical perception exist across English and Spanish (e.g. the VOT contrast for stops), providing further salient evidence of whether a given word can be considered Spanish or English. In the case of codemixed utterances, cues from the sentence as a whole may complicate the matter. Bilingual children are more often exposed to monolingual utterances and therefore may be experientially biased against the notion of codemixing, as often the most general, sentence-level cues are enough to reliably distinguish one language from another. These considerations lead to two hypotheses:
H1: Because bilingual children are able to distinguish even minor phonetic differences between English and Spanish prior to their second birthday (Deuchar & Clark, 1996; Werker & Byers-Heinlein, 2008), the participants in this study will be attuned to phonetic information in the input and can use it to linguistically categorize individual constituents of an utterance, even when it conflicts with global cues (e.g. the rest of the sentence). They will therefore categorize unknown lexical items with reference to what is phonologically permissible in each language, even in the face of conflicting pragmatic cues.
Otherwise, they will resort to other cues, e.g. the wider context ("frame") in which the item appears.
H2: Contextual cues, determined by the utterance as a whole rather than individual constituents, are more salient for young bilingual children as they conform not only to the phonology, but to the syntax and (for the most part) lexical inventory of one language (Connine & Clifton, 1987). Therefore, bilingual children will prioritize contextual cues when asked to assign new lexical items to a language.
H0: Two- to six-year-old bilingual children have differentiated neither lexical nor phonological systems, and are therefore unable to assign a given item to one language or the other.

Methods and Design
Participants
English-dominant bilingual children (N = 10; 2 male) between the ages of three years, six months (3;6) and six years, eleven months (6;11) were recruited in Holyoke, Massachusetts to participate. By two years old, typically-developing bilingual children have acquired the phonological systems of both their native languages (Werker & Byers-Heinlein, 2008) and will communicate in the language consistent with the input (Genesee & Nicoladis, 1997; Klapicová, 2016; Petitto et al., 2001).
This demonstrates their command of two critical precursors to the knowledge tested in this study: these children are aware of the sound patterns of each language, and they can alternate between linguistic systems in response to external stimuli (e.g. the interlocutor's language). Holyoke was selected for recruiting due to its proximity to the University of Massachusetts, as well as its high concentration of Spanish-speaking families.
Although several of these children received Spanish exposure both at home and in school, all of them were determined to be English-dominant bilinguals based on their guardians' responses to a Linguistic Background Questionnaire. The questionnaire was developed from a universal L2 assessment interface and adapted for caregivers to fill out regarding their children (rather than self-assessed, as in the original template) (Li et al., 2006). When asked directly at the beginning of the task, participants also said they preferred to communicate in English. Prior to beginning the task, each child answered questions such as what language do you speak with [caregiver], what language do you speak at school, and what language are we speaking now to determine whether they generally were familiar with the terms English and Spanish. All ten participants were able to answer these questions correctly according to the information provided by their caregivers on the Linguistic Background Questionnaire, indicating familiarity with the terms English and Spanish and the linguistic systems to which they referred.

Stimuli
Fifteen (15) nonwords (see table 1) were created for this experiment according to the phonotactics and phonemic inventories of English and Spanish. This was done by removing segments (i.e. individual phonemes or a sequence of phonemes) from real English and Spanish words and replacing them with common segments in the target language, or shared segments in the case of the ambiguous words. Five (5) of these were phonetically ambiguous, meaning they contained shared phonemes and did not directly violate phonotactics of either language; five (5) were well-formed in English only; and five (5) were well-formed in Spanish only. Phonotactic probability scores (PPS) were calculated using an online tool from the University of Kansas, for which both an English and a Spanish corpus is available.
The program bases its PPS on biphone and positional segment frequency, with the sum of all phoneme and biphone probabilities comprising the final score (Vitevitch & Luce, 2004). The inclusion range for English nonwords was a phonotactic probability score between 1.163 and 1.263; the inclusion range for Spanish was between 1.315 and 1.415. Separate inclusion criteria were selected because Spanish has a smaller phonemic inventory (n ≈ 24) (Mackenzie, 1999-2017) than English (n ≈ 44) (Edwards, 2002), and will therefore trend toward higher PPS in general.
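The scoring scheme described above can be sketched in a few lines of Python. The frequency tables below are invented for illustration (the actual calculator draws them from English and Spanish corpora), but the arithmetic follows the description: positional segment probabilities plus biphone probabilities, summed into a single score.

```python
def phonotactic_probability(phonemes, pos_freq, biphone_freq):
    """Sum positional segment probabilities and biphone probabilities.

    Phonemes or biphones absent from the tables contribute 0 to the score.
    """
    # positional segment frequency: probability of phoneme p at position i
    seg_sum = sum(pos_freq.get((p, i), 0.0) for i, p in enumerate(phonemes))
    # biphone frequency: probability of each adjacent phoneme pair
    bi_sum = sum(biphone_freq.get(pair, 0.0)
                 for pair in zip(phonemes, phonemes[1:]))
    return seg_sum + bi_sum

# invented toy frequency tables, for illustration only
pos_freq = {("t", 0): 0.05, ("u", 1): 0.03, ("n", 2): 0.04, ("i", 3): 0.02}
biphone_freq = {("t", "u"): 0.01, ("u", "n"): 0.02, ("n", "i"): 0.01}

pps = phonotactic_probability(["t", "u", "n", "i"], pos_freq, biphone_freq)
```

With corpus-derived tables for each language, the same nonword can be scored twice, once against English frequencies and once against Spanish, which is how the separate English and Spanish inclusion ranges above would be applied.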

Table 1. Nonwords and their phonotactic probability scores (PPS) in English and Spanish
Note. Items whose PPS = 0 are those which contain an illegal phoneme or are phonotactically prohibited by one language.
Each of these nonwords was inserted into an English or Spanish "frame" sentence, such as "His cat loves ___ a lot," or "Los cerdos y los ___ están comiendo" (= "The pigs and the ___ are eating").
Two different sets, each containing the same nonwords and frame sentences but in different combinations, were generated to maximize reliability, so that the same nonword might appear in an English sentence in set A, and a Spanish sentence in set B. Each participant was only tested on one set.
Each of the target sentence + nonword combinations was recorded in Praat and inserted into a corresponding illustration using an iPad application, Explain Everything.
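The two-set counterbalancing can be sketched as follows. This is a hypothetical reconstruction rather than the study's actual assignment procedure, but with fifteen nonwords, alternating assignments naturally yields the 8:7 and 7:8 English:Spanish splits across the two sets.

```python
import random

def make_counterbalanced_sets(nonwords, seed=0):
    """Assign each nonword an English frame in one set and a Spanish frame
    in the other, so every item appears in both frame languages across
    sets A and B."""
    order = list(nonwords)
    random.Random(seed).shuffle(order)  # randomize which items get which split
    set_a, set_b = {}, {}
    for i, word in enumerate(order):
        if i % 2 == 0:
            set_a[word], set_b[word] = "english", "spanish"
        else:
            set_a[word], set_b[word] = "spanish", "english"
    return set_a, set_b

set_a, set_b = make_counterbalanced_sets([f"nonword{i}" for i in range(15)])
```

Because each participant sees only one set, every nonword is still heard in both an English and a Spanish frame across the participant pool, which is the reliability rationale described above.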

Task
This task was designed to investigate whether bilingual children can consciously categorize a novel item as belonging to one language when it is introduced in the context of another (i.e. a Spanish word used in an English sentence, or an English word used in a Spanish sentence). Guardians were asked to consent on behalf of their children and to fill out a short Linguistic Background Questionnaire prior to data collection. This questionnaire was used to establish the child's dominant language and gauge their level of development in both English and Spanish.
The experimental task, presented on an iPad using Explain Everything, consisted of fifteen sentences paired with corresponding illustrations. Due to the odd number of nonwords, the English:Spanish sentence ratio was 8:7 for set A and 7:8 for set B. Each sentence contained one nonword used as a noun in either subject or direct object position. Children were asked to verbally assent prior to beginning the task. For each nonword + sentence pair, the child was shown a simple illustration on the iPad and asked to listen to a recording of the sentence containing the nonword, then prompted to figure out the name of the creature depicted in the image. If the child was unsure, the sentence would be played again. At this point, if the child still could not identify the nonword or the item to which it might refer, the researcher would point out the item and say, "In the story, they called it __. Can you say that?" and then, regardless of productive accuracy: "Do you think that's his name in English or Spanish?" The experiment was audio recorded via the iPad. An optional success break, during which children could choose the first of two small prizes, was included to promote motivation to finish the task. At the end, children were given their second prize or, if the success break was not utilized, both prizes at the same time. Altogether, the task took around 15-20 minutes to complete.

Hypothesized outcomes
For ease of reference, the conditions tested are outlined below. Younger participants will likely rely more heavily on contextual cues and judge individual lexical items with reference to the utterance as a whole. Non-codemixed utterances, in which all cues give the same indication with regard to language categorization (i.e. conditions A and D), will likely have the highest proportion of English- or Spanish-favoring judgments, regardless of the accepted hypothesis. H2 (whole-sentence > phonology): If children are primarily cued by the overall context in which the nonword appears, conditions in which the nonword is presented in an English frame (A, B, and C) would have a higher proportion of English judgments. The proportion of English judgments may decrease when the target is a Spanish-apparent nonword (as in B) and increase when the target is English-apparent (as in A), but still generally favor the context over contradictory phonetics. For ambiguous items (C and F), children will be cued by the frame sentence, differing minimally from outcomes in the non-codemixed conditions.
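For reference, the condition labels used above can be tabulated as a frame-language × apparent-phonology design. Condition E is not named explicitly in the text and is inferred here as the remaining pairing (Spanish frame, English-apparent nonword).

```python
# Conditions A-D and F as described in the text; E is inferred.
CONDITIONS = {
    "A": ("english", "english"),    # English frame, English-apparent (non-codemixed)
    "B": ("english", "spanish"),    # English frame, Spanish-apparent (codemixed)
    "C": ("english", "ambiguous"),  # English frame, ambiguous phonology
    "D": ("spanish", "spanish"),    # Spanish frame, Spanish-apparent (non-codemixed)
    "E": ("spanish", "english"),    # Spanish frame, English-apparent (codemixed)
    "F": ("spanish", "ambiguous"),  # Spanish frame, ambiguous phonology
}

# codemixed = frame and apparent phonology point to different languages
codemixed = [c for c, (frame, phon) in CONDITIONS.items()
             if phon != "ambiguous" and phon != frame]
```

Under this layout the two codemixed conditions are B and E, and the two non-codemixed baselines are A and D, matching the discussion above.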
Both hypotheses anticipate that cues from the non-dominant domain (i.e. pragmatics for H1 and phonology for H2) may impact judgments in cases where phonological and pragmatic cues conflict with one another; however, one cue will have more sway in the ultimate judgment.

Summary of the Data
In five of the six conditions, the majority of judgments categorized the nonword as Spanish (even when the target was an English-apparent nonword), indicating that there is perhaps another system at work outside of the anticipated phonological and contextual decoding when confronted with an unfamiliar lexeme. Notably, introducing a nonword in an English sentence did appear to influence children's decisions, as demonstrated by the decreased frequency of Spanish-sorting judgments in all the conditions in which an English frame was used. There was no significant difference between judgments from the oldest and youngest participants.
Note. The table reports codemixed items (English-apparent words in Spanish frames and Spanish-apparent words in English frames); bolded entries are English-apparent.
In non-codemixed utterances (e.g. "I put a / uni/ on the table") using an English frame, most child participants concluded that the nonword was English, as expected. Likewise, when presented with a Spanish-apparent nonword in a Spanish frame, participants judged the nonword as Spanish. In "codemixed" utterances (e.g. "La pelota está en / uni/"), however, there emerged a bias toward categorizing nonwords as Spanish, even when the sentential context and phonology indicated otherwise.
While Spanish-apparent words in an English frame were categorized as Spanish 62.5% of the time, 91.3% of English-apparent words in a Spanish frame were also judged as Spanish. These results reveal that the frame sentence, while influential in lexeme categorization decisions, is not the only contributor to these judgments; there appears to be a fundamental bias toward assuming novel vocabulary items are Spanish. Results were coded according to the language of the frame, the apparent phonology of the nonword, and the judgment given by each participant. Spanish frames and Spanish-apparent items were coded as 0, while English frames and English-apparent items were coded as 1. The apparent phonology of the five ambiguous nonwords was coded as 0.5. A binomial logistic regression analysis was conducted using Python, which revealed a coefficient of 0.47 relating the frame sentence and the final judgment, and of 0.20 relating the phonology and the final judgment (R² = 0.24, y-intercept = -2.06). The odds ratio for the frame sentence was 5.00, meaning that the likelihood that a participant would judge a nonword as English increased by a factor of 5 when the language of the frame sentence was also English. The odds ratio for the phonology of the nonword, on the other hand, was 2.04. The model was given an accuracy score of 0.86, or 86%. See appendix I to view the code used in this analysis.
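The shape of this analysis can be sketched as follows. The data here are simulated rather than the study's actual judgments (those appear in appendix I), and the simulated effect sizes are chosen only to mirror the reported direction of results; the point of the sketch is the coding scheme and the last line, where an odds ratio is obtained by exponentiating a fitted logistic-regression coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
frame = rng.integers(0, 2, n).astype(float)   # 0 = Spanish frame, 1 = English frame
phon = rng.choice([0.0, 0.5, 1.0], size=n)    # apparent phonology (0.5 = ambiguous)

# simulated judgments: negative intercept = baseline Spanish bias,
# with a strong frame effect and a weaker phonology effect
true_logit = -2.0 + 1.6 * frame + 0.7 * phon
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))  # 1 = judged English

# fit a binomial logistic regression by Newton-Raphson iteration
X = np.column_stack([np.ones(n), frame, phon])
beta = np.zeros(3)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)                          # per-observation weights
    beta += np.linalg.solve((X * W[:, None]).T @ X, X.T @ (y - p))

odds_ratios = np.exp(beta[1:])                 # odds ratio = exp(coefficient)
```

For instance, a fitted frame coefficient of ln(5.00) ≈ 1.61 corresponds to the reported odds ratio of 5, i.e. an English frame multiplies the odds of an "English" judgment by five.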

Discussion & Conclusions
Unexpectedly, in five of the six test conditions, the majority of judgments categorized the nonword as Spanish. There appears to be more congruity between columns than rows in Table 3, a finding supported by the statistical analysis, which related the frame-sentence language to the ultimate judgment with a coefficient of 0.47. Together, these findings indicate that the frame sentence had more impact on judgment outcomes than the phonology of the nonword. This aligns with H2, which anticipated that the overall context in which the nonword was presented would trump even contradictory phonetic cues. The proportions of English to Spanish judgments do not align with those predicted by H1, indicating that this hypothesis should be rejected. H2 predicted that conditions A, B, and C would yield a greater number of English judgments, which is indeed the case. For nonwords appearing in Spanish frames, the greatest proportion of Spanish judgments (94.4%) occurred when the item and its frame corresponded, with near-equivalent proportions in the other two Spanish-sentence conditions.
It appears that, up to six years old, many bilingual children still categorize unfamiliar lexical items based on the context in which the word appears rather than on the individual item's phonological features, even in cases where language-specific phonotactics are violated. English frame + English phonology pairs received the highest proportion of English judgments, followed by both Spanish-apparent and phonologically ambiguous words presented in English sentences. Similarly, for Spanish-favoring judgments, the condition yielding the greatest proportion was the non-contradictory one, in which Spanish-apparent items appeared in Spanish sentences. Both English-apparent and phonetically ambiguous nonwords in Spanish sentences were judged as Spanish in over 90% of cases.
The overall context (i.e., the frame sentence) in which a nonword appeared was found to be the better predictor of judgment outcomes. Context is not the only factor in assigning a new item to one language or the other, however; the item's phonology also appears to play a role.
Although in many cases phonological structure did not lead children to contradict the language of the frame sentence in their judgments, participants' comments on a number of the Spanish-apparent words in English sentences (condition B) demonstrated that their knowledge of Spanish phonology was indeed a consideration when making these judgments. When presented with condition B stimuli, several children remarked openly that the nonword "sounded different" from the rest of the utterance. In almost 65% of cases, the nonword was judged as Spanish despite the overall linguistic context indicating otherwise. In other words, children were able to ignore global cues in favor of item-specific cues when those cues tapped their knowledge of Spanish phonology. The same cannot be said, however, for the corresponding condition D, with English-apparent words in Spanish sentences: 91.3% of nonwords in this condition were judged according to the frame sentence rather than the phonetics.
Perhaps most tellingly, judgments for phonetically ambiguous nonwords heavily favored Spanish, even when the word was presented in an English frame. Given that all participants were English-dominant bilinguals, this finding, along with participants' judgments in the "codemixed" conditions B and D, may be attributed to a bias toward expanding the lexicon of the non-dominant language, perhaps due to a probabilistic understanding that unknown items are more likely to belong to the less replete lexicon. In other words, since all the children tested here were English-dominant bilinguals, it can be assumed that their Spanish lexicon contains more "gaps" to be filled. Several participants hinted (or, in one case, directly stated) that this was a consideration: for example, after much deliberation, when asked why they decided that one English-apparent item was a Spanish word, one child explained,

"Because I know what that is in English."
This explanation is further supported by the statistical analysis, which indicated a y-intercept of -2.06; this may be interpreted as a strong bias toward Spanish judgments. The ratio of English to Spanish judgments for nonwords presented in "codemixed" utterances was 11:36, even though there were equal numbers of English- and Spanish-favoring stimuli. Depending on the frequency with which each child was exposed to codemixing in daily life, this apparent bias could be the result of frequent exposure to Spanish items inserted into English structures (a common practice in some bilingual communities), but not vice versa. It could also be a simple matter of statistical learning: being English-dominant bilinguals, all the participants had a larger English vocabulary and were judged by their caregivers as more comfortable expressing themselves in English. It is conceivable that, having a less densely populated Spanish lexicon, these young bilinguals know, either consciously or unconsciously, that an unfamiliar word is more likely to belong to Spanish.

Four out of five of the English-apparent nonwords (/skwin/, /b le /, /d ines/, and /ned uli/), when introduced in an English sentence, were judged as English by all participants, indicating that while participants may be inclined to believe that an unfamiliar lexical item belongs to the language for which their personal lexicon is less robust, this does not cause them to entirely discount the possibility of unfamiliar items in the dominant language. In these cases, it appears that the combination of evidence from both phonetic and contextual domains allows children to override their bias and judge the novel items as English. This analysis is further supported by the fact that, even when they were presented in an English context, many participants still categorized the ambiguous nonwords as Spanish.
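The reading of the y-intercept as a bias can be made concrete: with both predictors coded 0 (a Spanish frame and Spanish-apparent phonology), the model's baseline probability of an English judgment is simply the logistic transform of the intercept. A quick back-of-the-envelope check, assuming standard logistic-regression conventions and using the reported value of -2.06:

```python
import math

def logistic(z):
    """Standard logistic (inverse-logit) function."""
    return 1.0 / (1.0 + math.exp(-z))

# Reported y-intercept from the regression described in the Results.
intercept = -2.06

# Baseline probability of an "English" judgment when both the frame
# and the phonology are coded as Spanish (0).
p_english = logistic(intercept)
print(round(p_english, 3))  # prints 0.113, i.e. roughly an 11% chance
```

Under the coding scheme described in the Results, this baseline corresponds to the all-Spanish condition, and an ~11% rate of English judgments is broadly consistent with the 94.4% Spanish-judgment rate reported for that condition.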
So, it appears that this bias toward Spanish is considerable and requires evidence from both the phonetics and the frame sentence to be overridden: for these children, whose Spanish lexicon is more sparsely populated than their English one, both the sentence as a whole and the phonology of the target item must indicate an English word in order for it to be judged as such. This reveals an important aspect of the way that bilingual children interpret novel items when those items are used in codemixed utterances. It appears that, with regard to language differentiation, most bilingual children as old as six still prefer a top-down analysis of a given utterance, focusing primarily on the context in which a word appears rather than on features of the word itself to determine whether it should be categorized as Language A or Language B. While this strategy may be productive for the vast majority of vocabulary acquisition, such a high success rate in normal cases may reinforce misconceptions when the strategy is used to assess a word that appears in a codemixed context.
Perhaps more importantly, these results may indicate an important strategy for simultaneously acquiring two lexical systems, wherein the child "rounds out" the weaker lexicon by preemptively assigning novel words to the less-developed language. If this is true, then inverted results should be observable in Spanish-dominant bilinguals given the same tasks; this would be an excellent undertaking for future iterations of this study. The accuracy of the statistical model could also be increased by expanding the pool of participants and including equal numbers of Spanish- and English-dominant bilinguals. Additional attention should also be paid to the gender makeup of participants, as this also affects vocabulary acquisition (Huttenlocher et al., 1991). As this study was conducted with predominantly female participants, and due to the small sample size, this aspect of development cannot be sufficiently addressed at present.
Because children as old as six did not appear swayed by phonetic cues or phonotactic violations in codemixed utterances, it would be prudent to include older participants in the future, to determine at what point in development bilingual children are able to make adult-like judgments of words introduced in codemixed utterances. Conversely, in order to be more accessible to younger participants, future iterations of this study may do well to employ eye-tracking or physical representations of the Spanish-English categorization so that children do not necessarily need to be verbal in order to complete the task.
For these children, a baseline criterion for metalinguistic awareness should also be used to determine whether they understand the significance of abstract terms such as "English" or "Spanish." Another option would be to provide two distinct nonwords per frame and ask the child to determine which one fits the rest of the sentence, indirectly testing their ability to perceive phonological similarities and differences between a discrete lexical item and the language to which it might belong.
Finally, based on the results of this study, one may predict that Spanish-dominant bilinguals would be similarly biased toward assuming that unfamiliar items are English; future investigations should test this hypothesis as well.
The results of this investigation indicate that children are aware of both word- and sentence-level cues as to which language is being spoken; however, all else equal, they are more likely to rely on features of the sentence as a whole when deciding which language to assign its constituents. The pooled results from children's judgments on nonwords presented in English and Spanish sentences leaned heavily toward Spanish in all conditions except the English sentence + English phonetics condition, and I argue that this may be attributed to a strategy for enhancing the less-developed language's lexicon. As all participants in the study were English-dominant bilinguals, the skew toward categorizing unfamiliar words as Spanish reflects underlying knowledge that their Spanish vocabulary is less robust, and that they are thus statistically more likely to encounter an unknown Spanish word than an unknown English word. However, given sufficient evidence, they also accept the possibility that an unknown item belongs to their dominant language. But what evidence is sufficient? In this case, the language of the frame sentence influenced judgments more strongly than the phonotactics and phonemic constituents of the nonword itself. This is a noteworthy finding for both parents and educators of bilingual children, and could potentially be useful when designing curricula targeted at multilingual communities. This is not to say that codemixing breaks down or disrupts cognitive barriers between languages; however, confounding input such as codemixed utterances (where phonological and contextual cues are contradictory or do not provide sufficient evidence for the child to accept a codemixed reading) may result in miscommunications or missed information.
Now that the top-down decoding strategy and the bias toward supplementing the weaker language have been identified, future studies may do well to explore the impact that codemixing, and the particular way that bilingual children decode mixed utterances, has on listening comprehension and vocabulary acquisition.