Tag Archives: metadata
Restricted search collections in OneSearch
2016-03-24
The following list of vendors restricts access to their collections (even to just the metadata, the information that is displayed in the search results) unless the user is on-campus or signed into OneSearch:
- ProQuest
- Scopus
- Web of Science
- ArtStor
- Bureau of National Affairs (BNA)
- RMIT Publishing
- American Geoscience Institute
- Henrietta Szold Institute
- Index to Hebrew Periodicals
For the average OneSearch user, this means that results from these vendors will not appear in their searches unless they are on campus or signed into OneSearch. To work around this restriction, the Office of Library Services displays a message at the top of the results list that prompts users to sign in if they are not already logged in or if they’re visiting from a non-campus-affiliated IP address.
(For more information about this prompt, please see the entry we wrote when we implemented this feature in July 2015: “Off campus?” prompt now appears only off-campus.)
Depending on the search, the user can expect to see anywhere from a handful of extra results to thousands of additional records for their query! For this reason, everyone is encouraged to sign into OneSearch to get all available results.
Which MARC Fields Are Used in OneSearch?
In an earlier blog post, we discussed the structure of OneSearch’s metadata, aka PNX. (See: Behind OneSearch: Part 1 – Internal Records (PNX).)
If you’ve wondered exactly which MARC fields are being used to create each of those PNX fields, we’ve got good news for you! We extracted the data and are making it available for your review. The resulting spreadsheet (CUNY-ALEPH-Norm-Rules-2015-12.xlsx) contains a list of all fields used by the OneSearch normalization process which creates the PNX records. The data was retrieved at the beginning of December 2015.
Now you’ll know that the 856 field, for example, is being used in the following PNX sections:
- delivery/delcategory
- links/addlink
- links/linktoreview
- links/linktorsrc
- links/linktotoc
The first tab in the spreadsheet, MARC, contains all the MARC fields in use. The second tab, Non-MARC, contains fields which do not use Aleph MARC as the source. These include fields containing static text as well as fields generated from system data or other PNX fields.
Each column of spreadsheet data has a filter applied, so, for example, you can click on the drop-down box in the “MARC” column of the MARC tab and select ‘008’ only. This will display only those PNX fields which use the MARC 008 field. The other columns in the MARC tab are:
- Ind1 (1st indicator)
- Ind2 (2nd indicator)
- MARC Subfields
- PNX Section
- PNX Subsection
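The same filter can be reproduced outside a spreadsheet program. The sketch below is a minimal illustration in Python, assuming the MARC tab has been exported to rows keyed by the column names above. The 856 rows come from the list earlier in this post; the `RULES` name, the `pnx_fields_for` helper, and the 245 row are made up for illustration and are not taken from the spreadsheet.

```python
# Rows mirror three of the MARC-tab columns. The 856 rows come from the list
# earlier in this post; the 245 row is a made-up placeholder, not real data.
RULES = [
    {"MARC": "856", "PNX Section": "delivery", "PNX Subsection": "delcategory"},
    {"MARC": "856", "PNX Section": "links", "PNX Subsection": "addlink"},
    {"MARC": "856", "PNX Section": "links", "PNX Subsection": "linktoreview"},
    {"MARC": "856", "PNX Section": "links", "PNX Subsection": "linktorsrc"},
    {"MARC": "856", "PNX Section": "links", "PNX Subsection": "linktotoc"},
    {"MARC": "245", "PNX Section": "example", "PNX Subsection": "placeholder"},
]


def pnx_fields_for(marc_tag, rows=RULES):
    """Return 'section/subsection' for every rule row that uses marc_tag."""
    return [
        f"{row['PNX Section']}/{row['PNX Subsection']}"
        for row in rows
        if row["MARC"] == marc_tag
    ]


print(pnx_fields_for("856"))
```

Calling `pnx_fields_for` with a tag that has no rows returns an empty list, which corresponds to the drop-down filter showing nothing for that tag.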
A minus (“-”) in the Ind1, Ind2, or MARC Subfields columns indicates that these values are omitted from use. You can think of it as the “NOT” Boolean operator. For example, the 500 field is used in the PNX display section, but its “5” and “6” subfields ($$5, $$6) are omitted (MARC Subfields = “-56”).
If a number appears in the Ind1 or Ind2 field, only that specific indicator is used. If no number appears in these fields, all indicators are included.
An asterisk in the MARC Subfields column indicates that all subfields are used.
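To make the notation concrete, here is a rough sketch in Python of how a single cell could be interpreted. The function names are hypothetical, and one case is an assumption rather than something stated in this post: a MARC Subfields cell that lists codes without a minus (for example “abc”) is treated as “only these subfields”.

```python
def subfield_is_used(spec, code):
    """Interpret a 'MARC Subfields' cell for one single-character subfield code.

    '*'   -> every subfield is used
    '-56' -> every subfield except 5 and 6 is used
    'abc' -> ASSUMPTION: only a, b, and c are used (not stated in the post)
    """
    spec = spec.strip()
    if spec == "*":
        return True
    if spec.startswith("-"):
        return code not in spec[1:]
    return code in spec


def indicator_is_used(spec, indicator):
    """Interpret an 'Ind1' or 'Ind2' cell for one indicator value.

    ''   -> all indicators are included
    '4'  -> only indicator 4 is used
    '-4' -> every indicator except 4 is used
    """
    spec = spec.strip()
    if not spec:
        return True
    if spec.startswith("-"):
        return indicator not in spec[1:]
    return indicator == spec


# The 500 example from the text: subfields 5 and 6 are omitted.
print(subfield_is_used("-56", "a"), subfield_is_used("-56", "5"))
```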
Download the spreadsheet here: CUNY-ALEPH-Norm-Rules-2015-12.xlsx
OneSearch, E-Resources, and Dedup
2016-01-26
Over the winter break, OLS applied a major improvement to OneSearch results: we “de-deduped” many e-resource records!
The Problem
As OneSearch was initially configured, it applied its dedup process to all records equally, whether they were print or electronic.
What is dedup? The dedup process allows OneSearch to present a single result when the identical resource is held by multiple schools. Multiple records are merged into one display record. This is useful in the case of print resources, allowing the system to offer all CUNY locations to the user in one result.
However, this proved a major problem for e-resource records as the merged data includes the MARC 856 fields (which often provide links to electronic resources). OneSearch uses these fields to create a “View Online” button which links to full-text whenever the underlying metadata indicates this is the purpose of the field (see Why Does OneSearch Say We Have the Electronic Book?).
Deduped e-resource records carried multiple 856 fields from the merged records, which made it difficult for OneSearch to determine the correct availability for many of them. This resulted in the dreaded “Full text may be available, see ‘Details’ for links” availability message, with its accompanying grey dot and no “View Online” button.
The Solution
E-resource records whose links require proxying are being prevented from matching other records for the purpose of deduping. This allows each local view to accurately identify the availability status of its e-resource.
The solution was only applied to records which require proxying on the assumption that e-resources that do not require proxying are freely available to everyone and should be shared with everyone.
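The rule can be pictured with a small sketch. This is not how OneSearch implements dedup internally; it is a simplified Python illustration in which every name (`group_for_display`, `match_key`, `requires_proxy`) is hypothetical. Records that require proxying are given a grouping key of their own, so they can never merge with another record.

```python
from collections import defaultdict


def group_for_display(records):
    """Group record ids into display records.

    Each record is a dict with hypothetical keys:
      'id'             - unique record identifier
      'match_key'      - value shared by records describing the same resource
      'requires_proxy' - True if the record's links need proxying
    """
    groups = defaultdict(list)
    for rec in records:
        if rec["requires_proxy"]:
            # Proxied e-resources never match anything: key on the unique id.
            key = ("solo", rec["id"])
        else:
            # Everything else is merged with records sharing its match key.
            key = ("dedup", rec["match_key"])
        groups[key].append(rec["id"])
    return list(groups.values())


records = [
    {"id": "A1", "match_key": "book-1", "requires_proxy": False},
    {"id": "B1", "match_key": "book-1", "requires_proxy": False},
    {"id": "A2", "match_key": "ebook-9", "requires_proxy": True},
    {"id": "B2", "match_key": "ebook-9", "requires_proxy": True},
]
print(group_for_display(records))
```

In this toy data the two print records merge into one display record, while the two proxied e-resource records stay separate even though they describe the same title.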
The Caveat
Please note that this solution works best in the local school views. A version of the problem still exists for many records in the CUNY view. Instead of a single, deduped record, the CUNY view now displays a group of records (as they are now FRBRized instead of deduped).
Clicking on the title will take the user to a list of individual e-resource records. Records which are only available via a local library will still show the “Full text may be available” message in the CUNY view.
If you run across any problems, please be sure to report them to OLS by opening a work order with the CUNY Service Desk.
OneSearch vs Aleph as an E-resource Source
Last week’s post about link checking describes one of the ways that OLS is working on improving access to e-resources via OneSearch.
Another area of work concerns e-resource records batchloaded to Aleph that are also available via the Primo Central Index (PCI). Because of the large number of duplicate records coming from Aleph, there are often problems determining availability correctly for these batchloaded records.
To avoid these problems, when OLS finds that PCI records are duplicated in Aleph, we mark the Aleph records in order to prevent transfer of these records to OneSearch.
Note: OLS is not removing records from Aleph; we are simply not using those records in OneSearch.
Additional benefits of this approach:
- Efficiency. Duplication doesn’t only happen with PCI. Sometimes the same records are loaded to Aleph for multiple libraries. Any library using OneSearch as their discovery tool can choose to stop running these duplicate batchloads.
- Faster turnaround with updates coming to PCI directly from the publisher.
- Reduced potential for error and out-dated content.
How do I know which Aleph records are being blocked?
means we don’t need this record in OneSearch. These records can be found in Aleph via a CCL search (where “xx” is your library code):
wst=pci and wow=xx
What if there is more metadata in the Aleph record than in the PCI record?
Many PCI records come with searchable full-text — for example, in addition to the metadata, the entire text of an article could be searchable. Although it is possible that PCI records will contain less metadata than Aleph records, in such cases, we believe that the searchable full-text is a more than ample substitute. On the OLS Support Site, the following information is available:
- PCI collection descriptions, including collection name, vendor, content, coverage, and whether full-text is searchable
- Each library’s PCI collection activations
How do we know if specific e-resources are available in PCI in order to decide what content is duplicated?
PCI makes activated collections searchable (aka “discoverable”). Full-text access for CUNY (aka “Full text Available” in OneSearch) happens because the resource is also activated in SFX. Many PCI collections can be searched for free, even when delivery of full-text is only available to subscribers. See the Primo, PCI, and SFX blog post for more information on this relationship.
At CUNY, we have activated all free-to-search PCI collections in our testing (“Sandbox”) version of OneSearch. When we want to check if specific e-resources are available via PCI, we can search for them in the Sandbox and, after the initial search result is displayed, we click on the “Include results without full text online” checkbox to include content to which we do not have full-text access.
Aleph e-resource records that are not found in PCI will continue to be sent to OneSearch.
Sandbox Caveat: The Sandbox does not include all CUNY content. It is restricted to 100K (local) records so it only contains a small percentage of our Aleph content.
Why does OneSearch sometimes say “No full-text” when there is a working link to full-text in the “Details” tab?
Distinguishing what is really available as full-text is challenging. To minimize user frustration, it is considered preferable to under-promise and over-deliver rather than to over-promise and under-deliver. In other words, the system errs on the side of false negatives (“No full-text” displayed even though full text is available), instead of false positives (“Full text available” displayed even though full text is not available).
For this reason, it can be helpful to remember that while “Full text available” is generally reliable, “No full-text” is less reliable. It is usually worthwhile to click on the “Details” tab of a “No full-text” result, just in case it contains some link that will get you to full-text.
Example: HathiTrust provides no indication of whether its content is open access or not, forcing its users to choose between false negatives and false positives. In response, we at CUNY have chosen to indicate that all HathiTrust content is “No full-text” even though many records will contain a link to full-text in the “Details” tab.
Are e-books showing up in Google Scholar? Many faculty will go to Google Scholar as their discovery tool and bypass the library.
E-resources activated in SFX show as available at CUNY in Google Scholar. This means that a side effect of this project is more availability information being shared with Google Scholar, inadvertently leading to a better user experience in Google Scholar. Please be aware, however, that Google Scholar shares little about how and where it gets the information it provides.
What are e-resources?
“E-resources” refers to any type of content that is available in electronic or digital form. This includes articles, books, videos, and more.
Looking for Link Errors in Aleph Catalog Records
The Office of Library Services is pleased to report that we have completed a thorough review of 856 (URL) fields in our catalog. On Thursday, November 5, we emailed the Cataloging Committee members explaining how to access their school’s error list in the Aleph task manager and providing details about the results.
Project Goals
- a school-specific report listing all the problem fields
- a method for temporarily removing these broken URLs from OneSearch
- an easy method for catalogers to update Aleph and OneSearch once the 856 fields are corrected
- an easy method for catalogers to obtain updated lists of any remaining broken 856 fields
To that end, suspect 856 fields have been marked in the Aleph cataloging record with an 856 subfield ($zBroken Link). For example:
85641$ahttp://state.tn.us/correction$zBroken Link
Until the link is fixed and the subfield is removed, the URL will not appear in OneSearch.
When the cataloger corrects the link in the 856 field, they will also remove the Broken Link subfield and, later that day or the next day (depending on the time of the update), the link will reappear in OneSearch.
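For anyone working with exported field data, the marker is easy to test for. The following Python sketch assumes a field is written the way the example above shows it: a three-character tag, two indicator characters, then subfields introduced by a single “$” and a one-character code. The function names are hypothetical, and a literal “$” inside a URL would break this simple split.

```python
def parse_field(raw):
    """Split a string like '85641$ahttp://example.org$zBroken Link'.

    Returns (tag, ind1, ind2, subfields), where subfields is a list of
    (code, value) pairs in their original order.
    """
    tag, ind1, ind2 = raw[:3], raw[3], raw[4]
    subfields = [(part[0], part[1:]) for part in raw[5:].split("$") if part]
    return tag, ind1, ind2, subfields


def is_suppressed(raw):
    """True if an 856 field carries the $zBroken Link marker."""
    tag, _, _, subfields = parse_field(raw)
    return tag == "856" and ("z", "Broken Link") in subfields


marked = "85641$ahttp://state.tn.us/correction$zBroken Link"
print(is_suppressed(marked))
```

A field without the `$z` marker, such as the same 856 after a cataloger has fixed it, is not flagged.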
While we have provided a list of broken links for each campus, any Aleph power user can easily get an updated list via an Aleph CCL (Common Command Language) search. For example, this search will output all broken links owned by CUNY Central (OWN=AL):
WUR="broken link" AND WOW=AL
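If you need the same search for several libraries, the string can be generated rather than typed. This is only a convenience sketch in Python: `ccl_and` is a hypothetical helper that assembles text to paste into the Aleph search box, and it does not talk to Aleph itself. It assumes that values containing spaces should be wrapped in double quotes, as in the search above.

```python
def ccl_and(*terms):
    """Join (code, value) pairs into a CCL search string using AND.

    Values containing spaces are wrapped in double quotes.
    """
    parts = []
    for code, value in terms:
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{code}={value}")
    return " AND ".join(parts)


# Reproduces the CUNY Central search shown above.
print(ccl_and(("WUR", "broken link"), ("WOW", "AL")))
```

Swapping “AL” for another library code produces that library's list.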
To help you get started, we recommend that catalogers prioritize the error list in the following ways:
- Troubleshoot e-resource-only records first (before mixed print+electronic). If these don’t work, the records have no reason to exist.
- Identify 856s in old records (government docs and books) that point to non-full-text content (like tables of contents). Consider removing these links without further assessment.
- Identify 856s for online versions of print material. Update if possible, otherwise remove the links (PURLs often don’t work).
Additional Notes
In September, we wrote about new error checks for the 856 field that had been added in the Aleph GUI. These error checks allow the cataloger to open a record, press CTRL-U (Check Record), and get an error report showing the type of error. Clicking on that error will take them directly to the problematic 856 field.
That September blog entry also lists all the MARC fields which can contain a URL. At the beginning of this project, we reviewed the content of all those other URL fields (505, 506, etc.) and found that their use was extremely rare at CUNY. For that reason, we only ran this URL checking process on 856 fields.
Since we have added the GUI-based URL checks, we do not anticipate running this process more than once a year.